Project 4: Emotions of JMU¶

Understanding the Place Where We Study¶

Madison Fernandez, Kate Dunham, Maggie Tewels, Lilia Edwards, Kayli Blankenship¶

📚 Table of Contents¶

📋 Getting Started¶

  • Instructions
  • Part 1: Research Overview
    • 1.1 Corpus Description
    • 1.2 Hypothesis

🔍 Data Exploration¶

  • Part 2: Exploratory Data Analysis
    • 2.1 Visualization 1
    • 2.2 Visualization 2

🧹 Data Processing¶

  • Part 3: Data Cleaning and Refinement
    • 3.1 Toponym Misalignment Analysis
    • 3.2 Toponym Refinement
    • 3.3 Revised Map
    • 3.4 Map Customization

🗺️ Spatial Analysis¶

  • Part 4: Spatial Comparison
    • 4.1 JMU Spatial Distribution
    • 4.2 Spatial Analysis

😊 Sentiment Analysis¶

  • Part 5: Sentiment Analysis Comparison

⏰ Time Series Analysis¶

  • Part 6: Time Series Animation Analysis

📝 Final Report¶

  • Part 7: Conclusion and Future Research

Part 1: Research Overview¶

Introduction¶

This project compares two Reddit corpora, one from James Madison University (JMU) and one from the University of Virginia (UVA), to understand how each student community discusses place and expresses sentiment toward different types of locations. By examining spatial patterns and sentiment trends across both datasets, we aimed to uncover how students at the two institutions talk about place and whether they engage with different types of locations in their conversations. After reviewing both corpora, we found that JMU’s posts mainly revolve around dorms, dining, campus buildings, and everyday life in Harrisonburg, Virginia, while UVA’s posts frequently reference international locations and discussions that extend into global or political contexts. Based on these differences, we hypothesized that JMU students would primarily discuss local, on-campus places with more positive sentiment, while UVA students would reference global or politically significant places more often and express more varied sentiment about them.

Our exploratory analysis in Voyant supported this pattern, with global terms appearing far more often in the UVA corpus and campus-related words dominating JMU’s data. Spatial mapping further reinforced this divide: UVA posts were linked to locations like Israel, Palestine, and Gaza, while the locations discussed by JMU students clustered on campus. Sentiment mapping showed similar trends, with JMU expressing positive sentiment toward campus spaces and UVA showing mixed sentiment toward globally referenced places. Overall, these findings consistently supported our hypothesis that the two communities discuss place in notably different ways.

1.1 Corpus Description¶

The JMU corpus mainly contains posts and comments focused on on-campus life. Students often discussed spaces like dorms, dining halls, and campus grounds, with frequent references to Harrisonburg and specific campus buildings. The UVA corpus discusses global events and international locations far more often than the JMU corpus; even when talking about Virginia, many of the discussions were related to local politics.

1.2 Hypothesis¶

We hypothesize that JMU students would primarily discuss local, on-campus spaces with a positive sentiment, while UVA students would reference global or politically significant places more often and express a varied sentiment about them.

Part 2: Exploratory Data Analysis¶

Through our research in Voyant, we aim to identify frequency trends and patterns in the Reddit posts that support our hypothesis. In particular, we hope to find that JMU students mention campus buildings in their Reddit posts more frequently than UVA students, and that UVA students mention global spaces more frequently than JMU students.

2.1 Visualization 1: Frequency Trends of “Global,” “Israel,” “Gaza,” and “Palestine” in UVA and JMU Reddit Posts¶

This Trends chart visualization shows the frequency of the words “global”, “Israel”, “Gaza”, and “Palestine” used by the UVA and JMU Reddit pages. We chose these words because they consistently came up as we cleaned our data.

✍️ Visualization Analysis¶

This visual confirms our hypothesis because UVA’s word frequency is significantly higher than JMU’s, indicating that UVA students discuss global spaces more often than JMU students.

2.2 Visualization 2: Frequency Trends of “Campus,” “Dorm,” and “Hall” in UVA and JMU Reddit Posts¶

This Trends chart visualization shows the frequency of the words “campus,” “dorm,” and “hall” within UVA and JMU Reddit posts, all of which relate to on-campus living.

This visualization confirms our hypothesis because JMU’s word frequency is higher than UVA’s in this chart, indicating that JMU students discuss on-campus topics in their Reddit posts more often than UVA students.

2.3 Visualization 3: Frequency of Words Used in UVA Reddit Posts¶

This Cirrus visualization shows the frequency of keywords used in UVA Reddit posts in a word-cloud format. Larger words represent higher frequency, and smaller words represent lower frequency.

✍️ Visualization Analysis¶

This visualization helps to confirm our hypothesis because it shows the high frequency of spaces outside of UVA, including Israel, America, United States, and Palestine.
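Voyant computes these frequencies for us, but the same counts can be reproduced in a few lines of Python. The sketch below uses made-up mini-corpora (the post lists are illustrative, not the real Reddit data):

```python
from collections import Counter
import re

def term_frequencies(posts, terms):
    """Count how often each search term appears across a list of post texts."""
    counts = Counter()
    for text in posts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for term in terms:
            counts[term] += tokens.count(term)
    return counts

# Hypothetical mini-corpora standing in for the real Reddit data
jmu_posts = ["My dorm hall is so loud", "The dining hall on campus is great"]
uva_posts = ["Protest about Gaza on the Lawn", "Discussion of Israel and Palestine"]

print(term_frequencies(jmu_posts, ["campus", "dorm", "hall"]))
print(term_frequencies(uva_posts, ["israel", "gaza", "palestine"]))
```

Comparing the resulting counts across the two corpora is essentially what the Voyant Trends chart visualizes.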

Part 3: Data Cleaning and Refinement¶

In [32]:
# =============================================================================
# SETUP: Import Libraries and Load Data
# =============================================================================
# This cell sets up all the tools we need for spatial sentiment analysis

# Force reload to pick up any changes to data_cleaning_utils
# This ensures we get the latest version of our custom functions
import importlib
import sys
if 'data_cleaning_utils' in sys.modules:
    importlib.reload(sys.modules['data_cleaning_utils'])

# Core data analysis library - like Excel but for Python
import pandas as pd

# Import our custom functions for cleaning and analyzing location data
from data_cleaning_utils import (
    clean_institution_dataframe,      # Standardizes and cleans location data
    get_data_type_summary,            # Shows what types of data we have
    get_null_value_summary,           # Identifies missing data
    create_location_counts,           # Counts how often places are mentioned
    create_location_sentiment,        # Calculates average emotions by location
    create_time_animation_data,       # Prepares data for animated time series
)

# Interactive plotting library - creates maps and charts
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as pyo

# =============================================================================
# CONFIGURE PLOTLY FOR HTML EXPORT
# =============================================================================
# Configure Plotly for optimal HTML export compatibility

# Method 1: Set renderer for HTML export (use 'notebook' for Jupyter environments)
pio.renderers.default = "notebook"

# Method 2: Configure Plotly for offline use (embeds JavaScript in HTML)
pyo.init_notebook_mode(connected=False)  # False = fully offline, no external dependencies

# Method 3: Set template for clean HTML appearance
pio.templates.default = "plotly_white"




# Load the cleaned JMU Reddit data (already processed and ready to use)
# This contains: posts, locations, coordinates, sentiment scores, and dates
df_jmu = pd.read_pickle("assets/data/jmu_reddit_geoparsed_clean.pickle")
In [33]:
# =============================================================================
# LOAD YOUR INSTITUTION'S DATA
# =============================================================================
# Replace the group number and institution name with your assigned data

# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number (e.g., "group_1", "group_2", etc.)
# Replace "UNC_processed.csv" with your institution's file name
df_institution = pd.read_csv("group_data_packets/group_3/python/UVA_processed.csv")
In [34]:
# =============================================================================
# CREATE RAW LOCATION MAP (Before Cleaning)
# =============================================================================
# This shows the "messy" data before we fix location errors
# You'll see why data cleaning is essential!

# STEP 1: Count how many times each place is mentioned
# Group identical place names together and count occurrences
place_counts = df_institution.groupby('place').agg({
    'place': 'count',           # Count how many times each place appears
    'latitude': 'first',        # Take the first latitude coordinate for each place
    'longitude': 'first',       # Take the first longitude coordinate for each place
    'place_type': 'first'       # Take the first place type classification
}).rename(columns={'place': 'count'})  # Rename the count column for clarity

# STEP 2: Prepare data for mapping
# Reset index makes 'place' a regular column instead of an index
place_counts = place_counts.reset_index()

# Remove any places that don't have valid coordinates (latitude/longitude)
# This prevents errors when trying to plot points on the map
place_counts = place_counts.dropna(subset=['latitude', 'longitude'])

# STEP 3: Create interactive scatter map
# Each dot represents a place, size = how often it's mentioned
fig = px.scatter_map(
    place_counts,                    # Our prepared data
    lat='latitude',                  # Y-coordinate (north-south position)
    lon='longitude',                 # X-coordinate (east-west position)
    size='count',                    # Bigger dots = more mentions
    hover_name='place',              # Show place name when hovering
    hover_data={                     # Additional info in hover tooltip
        'count': True,               # Show mention count
        'place_type': True,          # Show what type of place it is
        'latitude': ':.4f',          # Show coordinates with 4 decimal places
        'longitude': ':.4f'
    },
    size_max=25,                     # Maximum dot size on map
    zoom=4,                          # How zoomed in the map starts (higher = closer)
    title='Raw Location Data: Places Mentioned in UVA Reddit Posts',
    center=dict(lat=35.5, lon=-80)   # Initial map center (template default; 📝 TO DO: adjust for your data)
)

# STEP 4: Customize map appearance
fig.update_layout(
    map_style="carto-positron",      # Clean, light map style
    width=800,                       # Map width in pixels
    height=600,                      # Map height in pixels
    title_font_size=16,              # Title text size
    title_x=0.5                      # Center the title
)


# Configure for HTML export compatibility
fig.show(config={'displayModeBar': True, 'displaylogo': False})

3.1 Toponym Misalignment Analysis¶

Our dataset contained several notable toponym misalignments that required manual review. Many place names in the UVA corpus were assigned multiple, conflicting coordinates because the geoparser interpreted campus locations as global ones. For example, “Rotunda” appeared many times in the dataset; occasionally it was correctly mapped to the Rotunda on UVA’s campus in Charlottesville, VA, but more often than not it was incorrectly placed at Rotunda, Romania. Another major misalignment came from posts referencing global issues: words like “Palestine”, “Israel”, and “Gaza” were mapped correctly at times, but in several cases appeared with inconsistent coordinates, or coordinates that were overly broad and rounded to whole numbers.
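Misalignments like these can also be surfaced programmatically before manual review. A minimal sketch, assuming the `place`, `latitude`, and `longitude` columns of the processed CSV (the toy rows below are illustrative): flag any place name that was geocoded to more than one coordinate pair.

```python
import pandas as pd

# Toy rows mimicking the processed CSV's place/latitude/longitude columns
df = pd.DataFrame({
    "place":     ["Rotunda", "Rotunda", "Rotunda", "Gaza"],
    "latitude":  [38.0354,   46.5167,   46.5167,   31.5],
    "longitude": [-78.5034,  24.3833,   24.3833,   34.4667],
})

# Count distinct coordinate values per place name; >1 means conflicting geocodes
coord_variants = (
    df.groupby("place")[["latitude", "longitude"]]
      .nunique()
      .max(axis=1)
)
conflicts = coord_variants[coord_variants > 1].index.tolist()
print(conflicts)  # "Rotunda" was geocoded to two different coordinate pairs
```

A list like `conflicts` gives a shortlist of place names worth checking by hand in the spreadsheet.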

3.3 Revised Map¶

Some of the major fixes we made to the map include ruling out false positives, adding multiple locations for posts that mentioned multiple areas, and confirming the accuracy of locations. Places that appeared repeatedly in the dataset include the Rotunda, a building at UVA; the Lawn, a grassy common area on campus; and Scott Stadium. Global issues seemed to be the main concern of many of the posts, with countries other than the United States mentioned continually. Some places that surprised us were Israel, Palestine, and Gaza, because we didn’t expect Reddit to be a place where people discuss global issues.

In [35]:
# =============================================================================
# LOAD CLEANED DATA
# =============================================================================
# Load the CSV file you manually cleaned in Google Sheets

# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number
# Replace "UNC_processed_clean.csv" with your institution's cleaned file
df_institution_cleaned = pd.read_csv(
    "group_data_packets/group_3/python/UVA_processed_clean.csv"
)
In [36]:
# =============================================================================
# APPLY DATA CLEANING FUNCTIONS
# =============================================================================
# Use our custom function to standardize the cleaned data

# Apply the cleaning function to standardize data types and handle missing values
# This function ensures all datasets have the same format for consistent analysis

df_institution_cleaned = clean_institution_dataframe(df_institution_cleaned)

# Display first few rows to verify the cleaning worked properly
# This shows the structure and sample content of your cleaned data
df_institution_cleaned.head()
DataFrame cleaned successfully!
Out[36]:
school_name unique_id date sentences roberta_compound place latitude longitude revised_place revised_latitude revised_longitude place_type false_positive checked_by
0 UVA UVA_2178 2024-05-05 22:29:16 These police are decked out like they are depl... -0.616072 Al Fallūjah 33.34913 43.78599 Al Fallūjah 33.34913 43.78599 Unknown True Kate Dunham
1 UVA UVA_5710 2025-06-28 13:34:45 Despite his own Ivy League education, Vance wr... -0.107961 Appalachia 37.94632 -80.21925 Appalachia 37.94632 -80.21925 Unknown True Maggie Teweles
2 UVA UVA_798 2021-05-08 23:02:51 No need to pay, I added the world zip file in ... 0.233095 Earth 0.00000 0.00000 Earth 0.00000 0.00000 Unknown True Kayli Blankenship
3 UVA UVA_2004 2024-05-04 22:37:55 Take back the Earth. 0.013999 Earth 0.00000 0.00000 Earth 0.00000 0.00000 Unknown True Kayli Blankenship
4 UVA UVA_11794 2025-05-04 21:25:39 Who do they think is getting them to Mars? -0.015304 Mars 54.51210 45.68830 Mars 54.51210 45.68830 Unknown True Kayli Blankenship
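`clean_institution_dataframe` comes from the course’s `data_cleaning_utils` module, and its internals are not shown here. As a rough, assumed sketch only, a cleaning helper of this kind might coerce types and fill missing categories roughly like so (column names match the output above, but the exact behavior is a guess):

```python
import pandas as pd

def clean_institution_dataframe_sketch(df):
    """Illustrative stand-in for the course's cleaning helper (assumed behavior)."""
    df = df.copy()
    # Parse dates and coerce coordinates to numeric, turning bad values into NaN
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    for col in ["revised_latitude", "revised_longitude"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Normalize the false_positive flag to a real boolean
    df["false_positive"] = df["false_positive"].astype(str).str.lower().eq("true")
    # Fill missing place types with "Unknown", as seen in the output above
    df["place_type"] = df["place_type"].fillna("Unknown")
    return df

# Toy rows shaped like the cleaned CSV
toy = pd.DataFrame({
    "date": ["2024-05-05 22:29:16", "2024-05-06 10:00:00"],
    "revised_latitude": ["33.34913", "38.4496"],
    "revised_longitude": ["43.78599", "-78.8689"],
    "false_positive": ["True", "False"],
    "place_type": [None, "City"],
})
cleaned = clean_institution_dataframe_sketch(toy)
print(cleaned.dtypes)
```

Standardizing types this way is what lets the later groupby, mapping, and animation cells treat all institutions’ files identically.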

3.4 Map Customization¶

  
In [37]:
# =============================================================================
# CREATE CLEANED LOCATION MAP (After Manual Corrections)
# =============================================================================
# This map shows your data AFTER you fixed the location errors
# Compare this to the raw map above to see the improvement!

# STEP 1: Count occurrences using CLEANED/CORRECTED location data
# Now we use 'revised_place' instead of 'place' - these are your corrections!
place_counts = (
    df_institution_cleaned.groupby("revised_place")  # Group by corrected place names
    .agg(
        {
            "revised_place": "count",        # Count mentions of each corrected place
            "revised_latitude": "first",     # Use corrected latitude coordinates
            "revised_longitude": "first",    # Use corrected longitude coordinates
            "place_type": "first",           # Keep place type classification
        }
    )
    .rename(columns={"revised_place": "count"})  # Rename count column for clarity
)

# STEP 2: Prepare data for mapping
place_counts = place_counts.reset_index()  # Make 'revised_place' a regular column

# Remove places without valid corrected coordinates
place_counts = place_counts.dropna(subset=["revised_latitude", "revised_longitude"])

# STEP 3: Create the cleaned location map
fig = px.scatter_map(
    place_counts,
    lat="revised_latitude",          # Use corrected Y-coordinates
    lon="revised_longitude",         # Use corrected X-coordinates
    size="count",                    # Dot size = mention frequency
    hover_name="revised_place",      # Show corrected place name on hover
    hover_data={
        "count": True,               # Show how many mentions
        "place_type": True,          # Show place category
        "revised_latitude": ":.4f",   # Show corrected coordinates
        "revised_longitude": ":.4f",
    },
    size_max=25,                     # Maximum dot size
    title="Cleaned Location Data: Places Mentioned in UVA Reddit Posts",
    zoom=0.5,                          # 📝 TO DO: Adjust zoom level for your region
    center=dict(lat=34, lon=-20), # 
)

# STEP 4: Customize map appearance
fig.update_layout(
    map_style="carto-positron",      # Clean, readable map style
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)


# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})

Map Customization Insights¶

Prior to our map customizations, it was difficult to compare the frequency of global spaces on our maps because they started at a higher zoom with JMU at the center of the view. By decreasing the zoom level and re-centering the view over the middle of the Atlantic Ocean, we were able to see the higher frequency of global topics discussed in UVA’s Reddit posts compared to JMU’s. Similarly, changing the color palette from “Plotly” to “Set2” in the spatial comparison maps helped visually distinguish the place types from one another, showing a higher frequency of cities and states discussed in UVA’s Reddit posts.

Part 4: Spatial Comparison¶

4.1 JMU Spatial Distribution¶

In [38]:
# =============================================================================
# JMU SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a filtered map showing only certain types of places for JMU

# STEP 1: Use custom function to filter and count JMU locations
# This function applies the same filtering to both datasets for fair comparison
JMU_filtered_locations = create_location_counts(
    df_jmu,                          # JMU Reddit data
    minimum_count=2,                 # Only show places mentioned 2+ times
    place_type_filter=['State', 'City', 'Country']  # Only these place types
)


# STEP 2: Create colored scatter map
# Each place type gets a different color to show spatial patterns
fig = px.scatter_map(
    JMU_filtered_locations,
    lat="revised_latitude",
    lon="revised_longitude", 
    size="count",                    # Dot size = mention frequency
    color="place_type",              # Different colors for different place types
    hover_name="revised_place",
    hover_data={
        "count": True,
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,
    zoom=0.5,                          # 📝 TO DO: Adjust to highlight interesting patterns
    title="Cleaned Location Data: Places Mentioned in JMU Reddit Posts",
    center=dict(lat=34, lon=-20), # 📝 
    color_discrete_sequence=px.colors.qualitative.Set2  # Categorical color palette
)

# STEP 3: Customize layout
fig.update_layout(
    map_style="carto-positron", 
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)


# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})
In [39]:
# =============================================================================
# YOUR INSTITUTION'S SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a comparable map for your institution using identical filtering

# STEP 1: Apply the same filtering to your institution's data
# Using identical parameters ensures fair comparison with JMU
institution_filtered_locations = create_location_counts(
    df_institution_cleaned,          # Your cleaned institution data
    minimum_count=2,                 # Same minimum as JMU map
    place_type_filter=["State", "City", "Country"]  # Same place types as JMU
)


# STEP 2: Create matching visualization
# Keep all settings the same as JMU map for direct comparison
fig_institution_cleaned = px.scatter_map(
    institution_filtered_locations,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",
    color="place_type",              # Same color coding as JMU map
    hover_name="revised_place",
    hover_data={
        "count": True,
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,                     # Same size scale as JMU
    zoom=0.5,                          # 📝 TO DO: Adjust for your region
    title="Cleaned Location Data: Places Mentioned in UVA Reddit Posts",  
    center=dict(lat=34, lon=-20), # 📝 TO DO: Center on your region
    color_discrete_sequence=px.colors.qualitative.Set2,  # Same colors as JMU
)

# STEP 3: Apply identical layout settings
fig_institution_cleaned.update_layout(
    map_style="carto-positron",      # Same style as JMU map
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)



# Display with HTML export configuration
fig_institution_cleaned.show(config={'displayModeBar': True, 'displaylogo': False})

4.2 Spatial Analysis¶

Spatially, James Madison students discussed local topics, while conversations on Reddit from UVA students focused on global ones. The data revealed that University of Virginia students discuss locations outside of their campus, such as Gaza, Palestine, and Israel, significantly more than students at JMU. By contrast, students at James Madison discussed on-campus spaces, like dorms and academic buildings, more than locations outside the United States. JMU students rarely discussed locations outside of Harrisonburg, but when they did, they tended toward the East Coast, specifically New Jersey.
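One way to put a number on this local-versus-global split (a rough sketch, not part of the original analysis) is to classify each mention as inside or outside a contiguous-US bounding box using the cleaned coordinates. This is only an approximation, not a true country lookup:

```python
import pandas as pd

# Rough contiguous-US bounding box (an approximation, not a country lookup)
US_LAT = (24.5, 49.5)
US_LON = (-125.0, -66.9)

def share_outside_us(df):
    """Fraction of location mentions falling outside a rough US bounding box."""
    inside = (
        df["revised_latitude"].between(*US_LAT)
        & df["revised_longitude"].between(*US_LON)
    )
    return 1.0 - inside.mean()

# Toy rows standing in for the cleaned data: Harrisonburg, NJ, Gaza, Jerusalem
toy = pd.DataFrame({
    "revised_latitude":  [38.4496, 40.0583, 31.5, 31.9522],
    "revised_longitude": [-78.8689, -74.4057, 34.4667, 35.2332],
})
print(f"{share_outside_us(toy):.0%} of mentions fall outside the US box")
```

Computing this share for both `df_jmu` and `df_institution_cleaned` would quantify the contrast the maps show visually.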

Part 5: Sentiment Analysis Comparison¶

In [40]:
# =============================================================================
# JMU SENTIMENT ANALYSIS MAP
# =============================================================================
# Shows the EMOTIONAL tone of how JMU students talk about different places
# On the Spectral scale used below, red = negative, blue = positive

# STEP 1: Calculate average sentiment scores by location
# This function groups identical locations and averages their sentiment scores
df_jmu_sentiment = create_location_sentiment(
    df_jmu,                          # JMU Reddit data with sentiment scores
    minimum_count=5,                 # Only places mentioned 5+ times (for reliability)
    place_type_filter=None           # Include all place types for comprehensive view
)


# STEP 2: Create sentiment visualization map
# Color represents emotional tone on the Spectral scale: red = negative, blue = positive
fig_sentiment = px.scatter_map(
    df_jmu_sentiment,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",                    # Larger dots = more mentions (more reliable sentiment)
    color="avg_sentiment",           # Color intensity = emotional tone
    color_continuous_scale="spectral", # Diverging Spectral scale (red = negative, blue = positive)
    hover_name="revised_place",
    hover_data={
        "count": True,               # How many posts contributed to this sentiment
        "avg_sentiment": ":.3f",     # Average sentiment score (3 decimal places)
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,
    zoom=0.5,                          # 📝 TO DO: Adjust to focus on interesting patterns
    title="Average Sentiment by Location in JMU Reddit Posts",
    center=dict(lat=34, lon=-20),  
)

# STEP 3: Customize layout for sentiment analysis
fig_sentiment.update_layout(
    map_style="carto-darkmatter",      # Clean background to highlight sentiment colors
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)


# Display with HTML export configuration
fig_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})
In [41]:
# =============================================================================
# YOUR INSTITUTION'S SENTIMENT ANALYSIS MAP
# =============================================================================
# Compare emotional patterns between your institution and JMU

# STEP 1: Calculate sentiment for your institution using identical methods
institution_sentiment = create_location_sentiment(
    df_institution_cleaned,          # Your cleaned institution data
    minimum_count=2,                 # Same minimum as JMU (ensures fair comparison)
    place_type_filter=None           # Same filter as JMU (include all place types)
)


# STEP 2: Create matching sentiment visualization
# Use identical settings to JMU map for direct comparison
fig_institution_sentiment = px.scatter_map(
    institution_sentiment,
    lat="revised_latitude",
    lon="revised_longitude",
    size="count",
    color="avg_sentiment",
    color_continuous_scale="Spectral", # Same color scale as JMU map
    hover_name="revised_place",
    hover_data={
        "count": True,
        "avg_sentiment": ":.3f",
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f",
    },
    size_max=25,                     # Same size scale as JMU
    zoom=0.5,                          # 📝 TO DO: Adjust for your region
    title="Average Sentiment by Location in UVA Reddit Posts",  
    center=dict(lat=34, lon=-20),  
)

# STEP 3: Apply identical layout for comparison
fig_institution_sentiment.update_layout(
    map_style="carto-darkmatter",      # Same background as JMU map
    width=800, 
    height=600, 
    title_font_size=16, 
    title_x=0.5
)



# Display with HTML export configuration
fig_institution_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})

Sentiment Comparison Analysis¶

After our research and the creation of sentiment charts, our hypothesis was confirmed. Comparing the sentiment analysis charts, the UVA Reddit data shows a wide geographic spread, with many posts referencing locations outside the United States, particularly in the Middle East. These locations display a much broader emotional range, including stronger negative sentiment values. This variation indicates that UVA students engage in more global conversations and use more complex language when discussing global politics. In contrast, JMU’s Reddit posts were overwhelmingly clustered around local, on-campus areas with relatively positive-to-neutral sentiment. This pattern suggests that JMU students are more inclined to discuss local spaces in a way that reinforces community and belonging, whereas UVA students are more inclined to discuss spaces involved in global politics and current events, leading to a more emotionally diverse sentiment chart.
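The claim that UVA’s locations show a “broader emotional range” can be made concrete by summarizing the spread of per-location sentiment in each corpus. A sketch with hypothetical per-location averages (the real values would come from `create_location_sentiment`):

```python
import pandas as pd

def sentiment_spread(per_place_sentiment):
    """Summarize how varied per-location average sentiment is for one corpus."""
    s = pd.Series(per_place_sentiment)
    return {"mean": s.mean(), "std": s.std(), "range": s.max() - s.min()}

# Hypothetical per-location sentiment averages standing in for the real outputs
jmu = [0.30, 0.25, 0.40, 0.20]    # clustered, mildly positive
uva = [0.45, -0.60, 0.10, -0.35]  # wider emotional range

print("JMU:", sentiment_spread(jmu))
print("UVA:", sentiment_spread(uva))
```

A larger standard deviation and range for the UVA values would correspond to the more varied coloring visible on its sentiment map.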

Part 6: Time Series Animation Analysis¶

Since the data has a time variable, we can also plot changes in place sentiment over time. This gives insight into whether sentiment toward a location stays consistent or whether there were moments when it improved or worsened.

In [42]:
# =============================================================================
# ANIMATED TIME SERIES: SENTIMENT CHANGES OVER TIME
# =============================================================================
# Watch how places accumulate mentions and sentiment changes over time
# This reveals temporal patterns in student discussions

# STEP 1: Prepare animation data with rolling averages
# This function creates monthly frames showing cumulative growth and sentiment trends
institution_animation = create_time_animation_data(
    df_institution_cleaned,          # Your cleaned institution data
    window_months=2,                 # 2-month rolling average (smooths out noise)
    minimum_count=4,                 # Only places with 4+ total mentions
    place_type_filter=None           # Include all place types (📝 TO DO: experiment with filtering)
)

# STEP 2: Create animated scatter map
# Each frame represents one month, showing cumulative mentions and current sentiment
fig_animated = px.scatter_map(
    institution_animation,
    lat="revised_latitude",
    lon="revised_longitude",
    size="cumulative_count",         # Dot size = total mentions up to this point in time
    color="rolling_avg_sentiment",   # Color = 2-month average sentiment (smoother than daily)
    animation_frame="month",         # Each frame = one month of data
    animation_group="revised_place", # Keep same places connected across frames
    hover_name="revised_place",
    hover_data={
        "cumulative_count": True,    # Total mentions so far
        "rolling_avg_sentiment": ":.3f", # Smoothed sentiment score
        "place_type": True,
        "revised_latitude": ":.4f",
        "revised_longitude": ":.4f"
    },
    color_continuous_scale="Spectral", # Same sentiment colors as static maps
    size_max=30,                     # Slightly larger max size for animation visibility
    zoom=0.5,                          # 📝 TO DO: Adjust zoom for your region
    title="Institution Reddit Posts: Cumulative Location Mentions & Rolling Average Sentiment Over Time",
    center=dict(lat=34, lon=-20),  
    range_color=[-0.5, 0.5]          # Fixed color range for consistent comparison across time
)

# STEP 3: Customize animation settings and layout
fig_animated.update_layout(
    map_style="carto-darkmatter",
    width=800,
    height=600,
    title_font_size=16,
    title_x=0.5,
    coloraxis_colorbar=dict(         # Customize the sentiment legend
        title="Rolling Avg<br>Sentiment",
        tickmode="linear",
        tick0=-0.5,                  # Start legend at -0.5 (most negative)
        dtick=0.25                   # Tick marks every 0.25 points
    )
)

# STEP 4: Set animation timing (in milliseconds)
# 📝 TO DO: Experiment with these values for optimal viewing
fig_animated.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 800    # Time between frames
fig_animated.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300 # Transition smoothness

# Display with HTML export configuration
fig_animated.show(config={'displayModeBar': True, 'displaylogo': False})

✍️ Time Series Analysis¶

For our time-series animation, we optimized the visuals by changing the map style from “carto-positron” to “carto-darkmatter” to make the sentiment colors stand out. We also changed the continuous sentiment color scale from “RdYlGn” to “Spectral” to visually reflect a wider range of emotions. We increased the minimum count to 4 to create a cleaner display of relevant places, and reduced the rolling window to 2 months to smooth the transition of emotions. The sentiments were largely focused on spaces in Virginia until the end of 2020, when spaces around the United States became more discussed. Sentiment was relatively neutral throughout the time series. Our animation showed an increase in discussions about spaces outside the US around December 2023, with negative sentiments attached; over time, these sentiments became more neutral. These patterns suggest that global spaces are discussed more during major global current events, carrying a more negative and complex range of emotions, which ultimately supports our hypothesis.
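`create_time_animation_data` handles the rolling average internally; the smoothing idea can be approximated in plain pandas (a sketch with toy monthly values, since the helper’s exact internals aren’t shown):

```python
import pandas as pd

# Toy monthly average sentiment for one place (stand-in for real per-post scores)
monthly = pd.DataFrame({
    "month": pd.period_range("2023-10", periods=4, freq="M").astype(str),
    "avg_sentiment": [-0.05, -0.40, -0.30, -0.10],
})

# A 2-month rolling mean smooths month-to-month swings, as in the animation
monthly["rolling_avg_sentiment"] = (
    monthly["avg_sentiment"].rolling(window=2, min_periods=1).mean()
)
print(monthly)
```

Shortening the window makes the animation react faster to events like the late-2023 spike; lengthening it smooths more aggressively.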

Part 7: Conclusion and Future Research¶

Our findings consistently supported our original hypothesis: JMU students primarily discuss local, on-campus places and generally express more positive sentiment toward those locations, while UVA students reference global or politically significant places far more often, with sentiment patterns that vary more widely. These differences highlight how the student communities at each university engage with space in distinct ways.

The research also faced some limitations. Many posts contained vague place names or slang that required manual interpretation, and automated location selection sometimes produced inaccurate coordinates. The two corpora were not perfectly balanced in size or posting frequency, which could also skew the results. Future research could improve data quality by building a refined list of campus-specific place names and collecting a larger, more evenly distributed sample from both corpora. Adding metadata, such as post type, could also strengthen the analysis. We could also explore how temporal events shape spatial conversations by comparing posts from different academic years or around major global events. Spikes in international references at UVA might align with political events, while JMU’s local focus might shift during major campus changes such as new building openings or leadership changes, for example new presidents or football coaches. Tracking these changes over time would help determine whether the spatial patterns we observed are consistent or situational.

Overall, our findings show that spatial sentiment analysis can offer meaningful insight into how different student communities relate to space. The contrast between JMU’s campus-centered discussions and UVA’s globally oriented conversations exemplifies how two nearby universities with similar student populations can still exhibit distinctly different spatial orientations. With more refined place-name lists, better handling of vague toponyms, and expanded comparisons across time, future research could deepen our understanding of how students connect their lived experiences, concerns, and identities to the spaces they talk about online.